ABSTRACT The fission yeast Schizosaccharomyces pombe has been widely used to study eukaryotic cell 
biology, but almost all of this work has used derivatives of a single strain. We have studied 81 independent 
natural isolates and 3 designated laboratory strains of Schizosaccharomyces pombe. Schizosaccharomyces 
pombe varies significantly in size but shows only limited variation in proliferation in different environments 
compared with Saccharomyces cerevisiae. Nucleotide diversity, π, at a near-neutral site, the central core of 
the centromere of chromosome II, is approximately 0.7%. Approximately 20% of the isolates showed 
karyotypic rearrangements as detected by pulsed field gel electrophoresis and filter hybridization analysis. 
One translocation, found in 6 different isolates, including the type strain, has a geographically widespread 
distribution and a unique haplotype and may be a marker of an incipient speciation event. All of the other 
translocations are unique. Exploitation of this karyotypic diversity may cast new light on both the biology of 
telomeres and centromeres and on isolating mechanisms in single-celled eukaryotes. 
Variation segregating in natural populations of model organisms is 
now widely studied to address questions about population structure 
(Ruderfer et al. 2006; Tsai et al. 2008), geographic differentiation 
(Johnson et al. 2004), and phylogeny, as well as to provide new 
sources of variation for the investigation of traits that were previously 
studied using either experimentally induced mutations or a small set 
of natural variants (Nieduszynski and Liti 2011). In the case of the 
budding yeast Saccharomyces cerevisiae, variation has, on one hand, 
been used to establish the origins and relatedness of the many yeast 
strains (Fay and Benavides 2005; Schacherer et al. 2009; Liti et al. 
2009) that have been exploited by humans and, on the other, to 
identify new components of pathways and processes that have been 
exhaustively studied by laboratory methods (Torabi and Kruglyak 
2011; Parts et al. 2011). In contrast, variation in Arabidopsis thaliana 
(Nordborg and Weigel 2008) has been studied to understand the 
population structure of a predominantly self-fertilizing plant (Kim 
et al. 2007; Fournier-Level et al. 2011), to identify variation in traits 
of fundamental and agricultural interest, such as disease resistance 
(Atwell et al. 2010; Nemri et al. 2010), and to serve as an ecological 
model that allows a unified understanding of trait variation at the 
population and nucleotide levels (Todesco et al. 2010). 
The fission yeast Schizosaccharomyces pombe (Egel 2003) is a powerful 
complement to S. cerevisiae for the study of eukaryotic cell 
autonomous processes. S. pombe was originally isolated from East 
African beer, but subsequently it has been found in many parts of 
the world in indigenous fermentations, fruit, molasses, and industrial 
glucose. There have been two large efforts to isolate S. pombe from 
fruit, nectar, or fermentations: one by Florenzano et al (1977) in the 
vineyards of Western Sicily, and the other by Gomes et al (2002) in 
four regions of Southeast Brazil. Thus, while S. pombe has played only 
a minor role in biotechnology in comparison to S. cerevisiae, isolates 
of S. pombe are present in many of the major yeast strain collections. 
Although S. cerevisiae was intensively studied by brewers, oenologists, 
and bakers prior to its exploitation as a laboratory model, S. pombe 
has only been studied in any depth as a model of eukaryotic cell 
biology. Almost all of this work exploits derivatives of the strain 
968, which was first identified in a French wine by Osterwalder and 
then developed as a genetic system by Leupold (1950) (Jurg Kohli, 
personal communication). Little is known, therefore, about the extent 
or nature of the variation of this yeast in nature. We assembled 
a collection of 81 isolates from many regions of the world. We analyzed 
the isolates at the phenotypic, genotypic, and karyotypic 
levels. We identified inherited variation in cell size but only limited 
variability in proliferative ability in various environments. There 
are extensive karyotypic differences between many of the strains. The 
level of nucleotide variation, π, at neutral sites is about 0.7%, which 
is higher than in S. cerevisiae (Liti et al 2009). Our data suggest that 
S. pombe exists in small, incompletely isolated populations and that 
these occupy a limited range of environments. Although mechanistic 
analysis of the phenotypic diversity of S. pombe will require structural 
analysis of the different karyotypes, the diverse karyotypes may 
themselves provide new insights into centromere and telomere func- 
tion, isolation mechanisms, and speciation in single-celled eukaryotes 
(Gordon et al 2011). 



MATERIALS AND METHODS 
Handling of strains 

Isolates typically arrived on agar slopes or as freeze-dried samples. If 
they were freeze dried, then the yeast was reconstituted with water and 
streaked onto supplemented yeast extract (YES) agar plates (5% yeast 
extract, 3% glucose, 225 mg/L histidine, 225 mg/L adenine, 225 mg/L 
leucine, 225 mg/L uracil, and 2% Bacto agar; Becton Dickinson). 
Strains were maintained either on YES agar or cultured in liquid YES. 

Restriction site mapping and PCR 

Filter transfer, hybridization analysis, and pulsed field gel electropho- 
resis were carried out as previously described (Brown 1988; Brown 
et al 1990). PCR was carried out using Taq polymerase (homemade or 
from Yorkshire Biosciences). Sanger sequencing was carried out using 
BigDye v3.1 (Applied Biosystems). Primers used to construct probes 
for filter hybridization are given in supporting information, Table S3. 

DNA extraction, sequencing, and analysis 

DNA for PCR was extracted from 5 mL yeast cultures using 
a protocol kindly supplied by Jacob Dalgaard of the Marie Curie 
Research Institute. Five milliliters of saturated culture was concentrated 
by centrifugation; spheroplasted using zymolyase 20T in 100 µl of 
1 M sorbitol and 50 mM EDTA; concentrated by centrifugation once 
again; resuspended in 0.2 mL of DNAzol; and vortex mixed. The 
DNA was precipitated with an equal volume of cold ethanol. The 
crude DNA was treated with ribonuclease and then pronase in 
10 mM Tris-HCl (pH 8.0), 1 mM EDTA, and 0.1% SDS; extracted 
between three and five times with a 1:1 mixture of phenol and chloroform; 
and finally precipitated with ethanol prior to use. Primers used 
to amplify and sequence DNA for diversity analysis are listed in Table 
S4. 

DNA was amplified with Taq polymerase in an ammonium 
chloride buffer containing 2 mM MgCl2 using the following conditions: 
an initial denaturation step of 92° for 30 sec was followed by 33 
cycles of 92° for 10 sec, 56° for 10 sec, and 65° for 2 min. Primers and 
unincorporated dNTPs were removed from the reactions using 
Ampure (Agencourt, Beckman Coulter). The products were sequenced 
on each strand with the primers described above using BigDye (v2 or 
v3.1; Applied Biosystems) and then purified prior to electrophoretic 
analysis using Cleanseq (Agencourt, Beckman Coulter). Sequences were 
aligned and edited using Bioedit and collapsed into haplotypes using 
FaBox (http://www.birc.au.dk/~biopv/php/fabox/). Sequences that defined 
unique haplotypes at any one locus were reamplified and resequenced. 
Standard summary statistics were extracted with Arlequin 
(Excoffier et al 2005) and DnaSP (Librado and Rozas 2009). DNA 
for pulsed field gel electrophoresis was embedded in agarose plugs and 
extracted as described (Smith et al 1987). 
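
As an illustration of the summary statistics named above, the following sketch computes nucleotide diversity (π) and Watterson's θ from an alignment. It is a minimal sketch and not the Arlequin or DnaSP implementation: the function names and the toy alignment are illustrative assumptions, and the sample size of 40 used in the final lines is an assumption based on the 40 compound haplotypes described in the Results.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one nucleotide occurs."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def watterson_theta(num_segregating, num_seqs, length):
    """Watterson's theta per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, num_seqs))
    return num_segregating / (a_n * length)


# Toy alignment (invented for illustration, indels already removed).
toy = ["ACGT", "ACGA", "ATGA"]
toy_pi = nucleotide_diversity(toy)
toy_theta = watterson_theta(segregating_sites(toy), len(toy), len(toy[0]))

# Values from Table 2 for the centromere central core: 21 segregating sites
# in 719 residues; n = 40 sequences is an assumption (one per haplotype).
cen2_theta = watterson_theta(21, 40, 719)
```

With 21 segregating sites, 719 residues, and an assumed sample of 40 sequences, `watterson_theta` returns approximately 6.87 × 10⁻³, in line with the θ value listed for the centromere central core in Table 2.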

Microarray analysis 

For microarray analysis, chromosomal DNA was size-fractionated by 
pulsed field gel electrophoresis, electroeluted from the gel into dialysis 
tubing, concentrated using butan-2-ol, and amplified with the Qiagen 
REPLI-g mini kit before labeling using the Agilent DNA ULS labeling 
kit (5190-0419). It was then purified and hybridized to the Agilent 
S. pombe 4 x 44K ChIP-on-chip array (G4810) using unfractionated 
CRUK 972 DNA as competitor. In some experiments, unfractionated 
DNA from natural isolates was used as target. All steps were carried 
out according to the manufacturer's instructions. Arrays were scanned 
using an Agilent Scanner, and the data were analyzed using the Agilent 
Genomic Workbench v5.0.14. 

Quantifying natural trait variation 

Strains were subjected to high-throughput phenotyping by micro- 
cultivation (n = 2) in an array of environments essentially as de- 
scribed (Warringer et al 2008; Warringer and Blomberg 2003). 
Briefly, strains were inoculated in 350 µl of YES medium (5% yeast 
extract, 3% glucose, 225 mg/L histidine, 225 mg/L adenine, 225 mg/L 
leucine, and 225 mg/L uracil) and incubated in two serial rounds of 
precultivation for 48-72 h at 30°. For experimental runs, strains were 
inoculated to an OD of 0.05-0.1 in 350 µl of YES medium (3% 
glucose was replaced by 3% of alternative carbon sources where in- 
dicated) and microcultivated for 48 h or 72 h in a Bioscreen analyzer 
C (Oy Growth Curves, Finland). Optical density was measured every 
20 min using a wide band (450-580 nm) filter. The mitotic prolifer- 
ation rate (population doubling time), lag (population adaptation 
time), and efficiency (total change in population density) were 
extracted from high-density growth curves and LN (natural logarithm)-transformed 
(data set in Table S5). The relative fitness variable 
for each strain and trait, $\mathrm{LSC}_{ij}$, was calculated as: 

$$\mathrm{LSC}_{ij} = \frac{1}{2}\sum_{r=1}^{2}\left[\frac{1}{10}\sum_{k=1}^{10}\log\left(wt_{kj}^{r}\right) - \log\left(x_{ij}^{r}\right)\right]$$

where $wt_{kj}$ is the fitness variable of the $k$th measurement of the wild 
type for trait $j$; $x_{ij}$ is the measure of strain $i$ for trait $j$; and $r$ indicates 
the run. The measure for proliferation efficiency was inverted to 
maintain directionality between fitness components. 
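
The relative fitness calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `relative_fitness`, the input layout (one list of wild-type measurements and one strain measurement per run), and the use of the natural logarithm on untransformed values are illustrative choices, not the authors' code.

```python
import math


def relative_fitness(wt_by_run, strain_by_run):
    """Average, over runs, of mean(log wild type) minus log(strain).

    wt_by_run: one list of wild-type measurements per run.
    strain_by_run: one measurement of the strain per run.
    For doubling time or lag, a positive score means the strain
    outperformed the wild type (shorter time).
    """
    if len(wt_by_run) != len(strain_by_run):
        raise ValueError("need one strain measurement per run")
    per_run = []
    for wt_values, x in zip(wt_by_run, strain_by_run):
        mean_log_wt = sum(math.log(v) for v in wt_values) / len(wt_values)
        per_run.append(mean_log_wt - math.log(x))
    return sum(per_run) / len(per_run)


# Hypothetical doubling times (hours) for two runs; values are invented.
wild_type = [[2.0, 2.0, 2.0], [2.2, 2.2, 2.2]]
score_equal = relative_fitness(wild_type, [2.0, 2.2])
score_faster = relative_fitness(wild_type, [1.0, 1.1])
```

A strain that matches the wild type in every run scores 0, and a strain whose doubling time is half that of the wild type in every run scores ln 2.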

Cell size at septation 

Cells were cultured overnight in YES containing glucose at 0.5% 
weight per volume until they reached mid-log phase, concentrated by 
centrifugation, and visualized by phase microscopy using a Zeiss 
Axioscop microscope with a Plan NeoFluor 40x objective. Images 
were collected and sizes were quantitated using Metamorph (Univer- 
sal Imaging) v6.1. 

Population structure revealed by F_ST analysis of 
concatenated sequences of noncoding DNA 

The large intron of SPBC660.16 (the gene that encodes phosphogluconate 
dehydrogenase), TER1, and CEN2 sequences were stripped of indels 
and, in the case of the large intron of SPBC660.16, the microsatellite; 
concatenated; and then grouped according to geographic origin. F_ST 
values (Cockerham and Weir 1984) were estimated using the Arlequin 
package (Excoffier et al 2005). 

Linkage disequilibrium calculation 

Informative SNPs were identified in the centromere of chromosome 
II (Figure S6) and its flanking DNA (Figure S7), and in the 
TER1 gene and its flanking DNA (Figure S8), and were used to estimate 
linkage disequilibrium (LD) statistics using DnaSP (Table 4). 

RESULTS 

A collection of natural isolates of S. pombe 

We started our project by assembling a collection of 84 S. pombe 
strains (Figure 1, Table 1, and Table S1). Of these strains, 81 were 
natural isolates and 3 were listed as laboratory strains. We first 
checked the ploidy by DNA staining with Sytox green and fluorescence 
flow cytometry; this showed that all of the strains but one 
(NCYC 2355) were haploid. We therefore subcloned NCYC 2355 
and isolated three haploid clones. We analyzed two of these clones 
using the methods described below; they were identical, so we refer to 
one: NCYC 2355-1. We characterized the 84 haploid strains by sequence 
and phenotype. We initially sequenced three segments of the 
genome that did not code for proteins: a segment of the central core of 
the centromere of chromosome II (II: 1,621,085-1,621,800); the gene encoding the RNA 
component of telomerase, TER1 (I: 3,084,446-3,086,143); and the 
second intron (the largest intron in the S. pombe genome, II: 
230,740-231,501) of gene SPBC660.16-1, which encodes phosphogluconate 
dehydrogenase. To establish a preliminary estimate of the 
levels of outcrossing (see below), we also sequenced two loci that 
flanked the centromere of chromosome II (II: 1,572,330-1,572,988 
and 1,658,033-1,658,751) and two loci centromere-distal of the 
TER1 gene (I: 3,113,975-3,114,540 and I: 3,194,538-3,195,201). In 
total, we sequenced and analyzed 5,777 bp in 84 strains. The results 
established that many of the isolates were closely related and that the 
entire collection could be reduced to 40 distinct haplotypes (Table 1). 
The strains with identical haplotypes either had a widespread distribution 
or were collected close to one another. Haplotypes with a widespread 
distribution could have spread around the world by natural 
mechanisms or as a result of human activity. The geographically restricted 
haplotypes may represent clones that had reached a high 
frequency because of founder effects or population bottlenecks, or they 
may have arisen from restricted collection activity. 
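
The reduction of strains to shared haplotypes described above amounts to grouping strains whose concatenated sequences are identical. A minimal sketch of that step is given below; it is not the FaBox implementation, and the function name, strain labels, and sequences are hypothetical placeholders.

```python
from collections import defaultdict


def collapse_haplotypes(sequences):
    """Group strains that share an identical concatenated sequence.

    sequences: dict mapping strain name to its aligned, concatenated
    sequence across all loci. Returns a list of strain groups, largest
    group first, with ties broken alphabetically.
    """
    groups = defaultdict(list)
    for strain, seq in sequences.items():
        groups[seq.upper()].append(strain)
    return sorted(
        (sorted(members) for members in groups.values()),
        key=lambda members: (-len(members), members),
    )


# Hypothetical strains and sequences, invented for illustration.
example = {
    "strain_A": "ACGTAC",
    "strain_B": "ACGTAC",
    "strain_C": "ACGTTC",
    "strain_D": "acgtac",  # same haplotype, different letter case
}
haplotype_groups = collapse_haplotypes(example)
```

Applied to the full data set, a grouping of this kind is what yields the 40 compound haplotypes listed in Table 1.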

Population genetics of S. pombe 

We used sequence data to analyze the diversity and population 
structure of S. pombe. To minimize the consequences of biased collecting 
activities or human trade, we carried out this analysis using the 
diversity contained within the 40 different haplotypes identified in 
Table 1 and Table S2. The diversity π (Table 2) detected at each of 
the three non-protein-coding loci varies, with the highest value being 
observed at the locus most likely to approach neutrality, the central 
core of the centromere of chromosome II. A negative value of Tajima's 
D suggests recent selection at or close to the region of the SPBC660.16 
intron. The value of π of 7 × 10⁻³ seen at the central core of the 
centromere of chromosome II is consistent with the variation seen in pairwise 
comparison of 4-fold degenerate sites between the laboratory strain and 
strains NCYC 132 and SPK 1820, which gave values of 8.9 × 10⁻³ 
and 6.7 × 10⁻³ (Rhind et al 2011), respectively, so we concluded that 
this estimate is correct. This is slightly higher than in the budding 
yeast S. cerevisiae, which has a value of 5.65 × 10⁻³ (Liti et al 2009). 
The fact that all but one of the strains are haploid and the assumption 




Figure 1 Geographic origin of the 84 strains of Schizosaccharomyces pombe used in this study. The area of the circles is proportionate to the numbers of strains from the respective areas. 
Table 1 Assorting 84 strains of S. pombe into 40 groups with shared haplotypes at seven loci 

Haplotype Number | Strain | Origin Where Known
1 | UWOPS 92.229.4 | Mexico
2 | UWOPS 94.422.2 | Mexico
3 | UFMG A529, UFMG 790, UFMG A826 | Brazil, Belo Horizonte and Viçosa
4 | UFMG R416, UFMG R418, UFMG R420, UFMG R424, UFMG R435 | Brazil, Aracaju
5 | UFMG R427 | Brazil, Aracaju
6 | UFMG A1263 | Brazil, Viçosa
7 | UFMG A521, UFMG A571, UFMG A602 | Brazil, Belo Horizonte
8 | UFMG A1000, UFMG A1153 | Brazil, Belo Horizonte and Salinas
9 | UFMG R434 | Brazil, Aracaju
10 | UFMG R428 | Brazil, Aracaju
11 | UFMG A1152 | Brazil, Salinas
12 | UFMG R437 | Brazil, Aracaju
13 | UFMG A738 | Brazil, Belo Horizonte
14 | NCYC 683, NCYC 2387, DBVPG 4435, AWRI 442 | Spain, Italy, South Australia
15 | NCYC 936, NCYC 2355-1 | Sri Lanka, Japan
16 | CBS 356, NCYC 132, NCYC 535, DBVPG 2817, DBVPG 4437, AWRI 141 | Eastern Mediterranean, Africa, Italy, Australia
17 | NCYC 380, CBS 1063, DBVPG 6281, CBS 355, DBVPG 6417 | Sicily, Spain
18 | DBVPG 4433, DBVPG 6279, DBVPG 6610, DBVPG 6699, Y0036, Y0037, CRUK 972, CRUK 975, Y 468, Y 469 | Germany, Indo-China, South Africa, France
19 | CBS 2628 | Pakistan
20 | CBS 2775, CBS 2776, CBS 2777 | Japan (all)
21 | CBS 5680 | Poland
22 | CBS 5682 | South Africa
23 | CBS 7335 | Spain
24 | DBVPG 2801 | Tunisia
25 | DBVPG 2805 | Malta
26 | DBVPG 2804, DBVPG 2806, DBVPG 2807, DBVPG 2808, DBVPG 2809 | Malta (all)
27 | DBVPG 2810 | Malta
28 | DBVPG 2811, DBVPG 2812, DBVPG 2814, DBVPG 2815, DBVPG 2816, DBVPG 2818 | Sicily (all)
29 | Y470 |
30 | Y 831, Y 832 | South Africa (both)
31 | CBS 374 | Delft
32 | DBVPG 6447, DBVPG 6449 |
33 | CBS 358 |
34 | CBS 1058 |
35 | CBS 357 |
36 | CBS 352 |
37 | CBS 1057 |
38 | CBS 1059 |
39 | CBS 1044 |
40 | L2470 |

Origins given for haplotypes 32-40, in the order listed (row assignment uncertain): Java, Jamaica, Indonesia, Sweden, Mauritius, Chile. 

Each of the 84 strains in the original collection was sequenced using conventional Sanger methodology at seven individual loci as described in the text and in File S1. 
Strains were grouped according to 40 compound haplotypes. For the details of the sequences of the individual loci and the haplotype structures, see File S1. 



of a neutral coalescent allow estimation of the effective mitotic population 
size N_e by use of the relationship π = 2N_e u, where u is the mutation 
rate per nucleotide per mitotic generation (Tsai et al. 2008). The 
neutral mutation rate in S. pombe is not known, so we assumed the 
rate that has been used for S. cerevisiae of 0.33 × 10⁻⁹/bp/generation 
(Lynch et al 2008), leading to an estimate of the global effective 
population size of 1 × 10⁷. It is also of interest to know how much 
variation is endemic to particular populations. There was too little 
sequence variation to be able to estimate population structure a priori 
using the Bayesian approaches implemented in the STRUCTURE 
(Pritchard et al 2000) and BAPS (Corander et al 2008) packages, 
so we grouped the haplotypes according to their geographic origin, 



Table 2 Nucleotide diversity at three noncoding loci in the genome of S. pombe 

Sequence | Number of Residues | π ± SD (× 10⁻³) | θ ± SD (× 10⁻³) | Tajima's D (per Sequence) | Number of Segregating Sites
CC CEN2-indels | 719 | 6.997 ± 3.863 | 6.866 ± 2.520 | 0.01131 | 21
TER1-indels | 1702 | 4.329 ± 2.298 | 4.420 ± 1.472 | -0.08316 | 32
660.16 intron-minisat | 743 | 1.603 ± 1.165 | 3.798 ± 1.51 | -1.77965 | 12

Diversity statistics were calculated based on the sequences at the three indicated loci using Arlequin. The sequences were cleared of indels and microsatellite sequences prior to analysis. 
Table 3 Population differentiation as measured by F_ST 

Population | American | African | European | Asian | Unknown
American (15) | 0 | | | |
African (5) | 0.075 | 0 | | |
European (11) | 0.087 | -0.021 | 0 | |
Asian (7) | 0.009 | 0.02 | 0.035 | 0 |
Unknown origin (9) | 0.044 | -0.01 | 0.051 | 0.022 | 0

The sequence data from the 40 compound haplotypes were used to estimate 
pairwise F_ST measures of population differentiation using the Arlequin package. 
Haplotypes (e.g., haplotypes 16 and 18) containing strains of different origins 
were considered as being of unknown origin. The numbers of strains in the 
respective populations are indicated in parentheses. Haplotypes containing 
strains of different known origins were assigned multiply to different populations, 
and thus the sum of the numbers in parentheses exceeds 40. 



where this was possible, and then analyzed them using the F statistic 
(Cockerham and Weir 1984). These results (Table 3) demonstrated 
little by way of population structure but suggested that the American 
strains were the most highly differentiated. Network analysis of the 
CEN2, TER1, and 660.16 sequences was also consistent with extensive 
mixing between the different haplotypes (Figure S6, Figure S8, Figure 
S9, and Figure S10). Two loci include sufficient data to estimate the 
extent of linkage disequilibrium, which is inversely related to the 
out-crossing rate. To do this, we identified informative SNPs in and 
around the centromere of chromosome II and within and centromere-distal 
of the TER1 gene (Figure S6, Figure S7, Figure S8, Figure S9, and 
Figure S10). A four-gamete test of the sequence variation around the 
centromere does not indicate recombination. Significant linkage 
disequilibrium is also detectable 30 kb from the telomerase RNA gene 
using both the D' and R statistics (Table 4). The extent of linkage 
disequilibrium would seem, therefore, to be greater than in S. paradoxus 
(Tsai et al 2008), which out-crosses once in every 1000 asexual 
generations and which shows lower levels of nucleotide diversity. 
Network analysis (Figure S6, Figure S9, and Figure S10) of the haplotypes 
at centromere II, TER1, and SPBC660.16, however, demonstrates 
that there is extensive mixing between different strains. More 
detailed analysis will be required to measure the rate of out-crossing in 
S. pombe and to correlate it with the karyotypic diversity described 
below. 
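
The effective population size quoted earlier in this section follows from rearranging π = 2N_e u to N_e = π / (2u). The short sketch below reproduces that arithmetic using the diversity at the centromere central core (7 × 10⁻³) and the S. cerevisiae mutation rate assumed in the text (0.33 × 10⁻⁹ per bp per generation); the function name is an illustrative assumption.

```python
def effective_population_size(pi, mutation_rate):
    """Solve pi = 2 * N_e * u for N_e (haploid, neutral coalescent)."""
    return pi / (2.0 * mutation_rate)


# pi at the central core of the chromosome II centromere, and the
# per-bp, per-generation mutation rate borrowed from S. cerevisiae.
n_e = effective_population_size(7e-3, 0.33e-9)
```

The result is roughly 1.06 × 10⁷, which rounds to the estimate of 1 × 10⁷ given above. Because the mutation rate is borrowed from another species, the estimate scales inversely with any error in that assumed rate.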

Trait variation in S. pombe is defined by population 
structure and geographic boundaries 

To survey natural trait variation in S. pombe, isolates were subjected to 
high-resolution quantification of proliferative ability in an array of 
environments representing variations in nutrient availability, temper- 
ature, and exposure to toxic metals and drugs. From high-density 
mitotic growth curves, the fitness components lag of proliferation, 
rate of proliferation (population doubling time), and efficiency of 
proliferation (population density change) were extracted, providing >120 
distinct measures of organism-environment interactions (Figure 2). 
These fitness components were compared with those of the S. cerevisiae 
universal type strain BY4741. In optimal conditions, S. pombe 
featured a delay in the time to initiate population growth and a marginal 
reduction in proliferation rate relative to S. cerevisiae (Figure S1, 
A and B). In contrast, the performance of S. pombe was very severely 
impaired in many stress-inducing environments (Figure 2B). In particular, 
S. pombe proliferation in the presence of alkali and alkaline 
earth metals, DNA-damaging agents, and respiratory or partially 
respiratory carbon sources was reduced (Figure 2B and Figure S1D). 
Utilization of maltose, the main storage carbohydrate of barley, represented 
an exception, being more efficient in S. pombe (Figure S1, 
D and E). Overall, trait variation in S. pombe was lower than what has 



Table 4 Linkage disequilibrium around centromere 2 and telomerase RNA gene (TER1) 

Chromosome II centromere, R/chi (R above the diagonal, chi squared below)
SNP | 1572349 | 1621091 | 1658392 | 1658455
1572349 | | -0.396 | -0.597 | -0.396
1621091 | 6.271* | | 0.664 | 1
1658392 | 14.235*** | 17.622*** | | 0.664
1658455 | 6.271* | 40*** | 17.622*** |

Chromosome II centromere, D'
SNP | 1572349 | 1621091 | 1658392 | 1658455
1572349 | | -1 | -1 | -1
1621091 | | | | 1
1658392 | | | | 1
1658455 | | | |

Chromosome I TER1 region, R/chi (R above the diagonal, chi squared below)
SNP | 3084762 | 3086123 | 3114239 | 3114431 | 3194949
3084762 | | 0.555 | 0.267 | 0.434 | -0.248
3086123 | 12.31*** | | 0.339 | 0.236 | -0.089
3114239 | 2.857 | 4.596* | | 0.85 | -0.0207
3114431 | 7.519** | 2.222 | 28.9*** | | -0.333
3194949 | 2.462 | 0.32 | 1.709 | 1.778 |

Chromosome I TER1 region, D'
SNP | 3084762 | 3086123 | 3114239 | 3114431 | 3194949
3084762 | | 0.826 | 0.331 | 0.457 | -0.0373
3086123 | | | 0.407 | 0.236 | -0.2
3114239 | | | | 1 | -0.385
3114431 | | | | | -0.333
3194949 | | | | |

Haplotypes at individual loci were examined to identify informative SNPs, and these were used to measure linkage disequilibrium by the indicated metrics using the 
DnaSP package. The significance of the chi-squared values is indicated by asterisks: *, at the 5% level; **, at the 1% level; ***, at the 0.1% level. The numbers 
at the margins of the table refer to the coordinates along the chromosome of the respective informative SNPs. 
Figure 2 Trait variation in S. pombe is defined by population structure and geographic origin. (A) The proliferative lag (time to initiate 
proliferation), proliferative rate (population doubling time), and proliferative efficiency (change in population density) were extracted from 
high-density growth curves (n = 2) of natural S. pombe isolates (N = 2) over 42 environments. (B) Hierarchical clustering of S. pombe natural isolates 
was performed using trait profiles based on all traits, a centered Pearson correlation metric, and average linkage mapping. Numbers indicate 
haplotype groups and color indicates geographic origin. Heat map depicts proliferation relative to the S. cerevisiae universal type strain BY4741 
(Log2 [isolate/BY4741]). Green = inferior proliferation, red = superior proliferation, black = BY4741 proliferation, gray = missing data. The red 
arrow indicates the S. pombe reference strains 972 h- and 975 h+. (C) Pearson correlation coefficients were calculated between all pairs of strains 



620 | W. R. A. Brown et al. 

G3: Genes | Genomes | Genetics 





Figure 3 Variation in length at septation among natural isolates of S. pombe. S. pombe isolates were cultured to mid-exponential phase of 
growth, harvested, and then analyzed by phase contrast microscopy. The individual isolates are arranged on the abscissa according to their 
respective haplotypes, with isolates of the same haplotype placed adjacent to one another and accorded the same color (alternating gray or 
yellow). The red arrow indicates the S. pombe reference strains 972 h- and 975 h+. 



been reported both for the variable S. cerevisiae and its less variable 
wild relative S. paradoxus (Warringer et al 2011) (Figure S2A). This 
suggests that S. pombe is adapted for proliferating in a more constrained 
ecological range than the baker's yeasts. Nevertheless, trait 
variation largely followed phylogenetic boundaries as defined by 
haplotype structure and geographic origin: strains with the same 
haplotype or the same geographic origin were significantly (Student t-test, 
P < 1E-34) more similar than other strains (Figure 2B-D). In addition, 
we surveyed cell length at septation and found the influence of 
the haplotype structure to prevail for this trait (Figure 3). The most 
phenotypically distinct S. pombe populations were the American population, 
with an impaired proliferation at high temperatures, and the 
European population, with a prolonged lag phase when initiating 
respiratory proliferation as well as elevated tolerance to DNA damage 
(Figure S2B). The strain CBS 2777 stands out in terms of mitotic 
fitness traits, and it emerges as one of the most atypical strains (Figure 
2E and Figure S3A). CBS 2777 is a strain with the unusual karyotype 
of four chromosomes (see below). CBS 2777 deviated from the 
S. pombe mean in more than 14% of all traits (Student t-test, FDR 
5%). Most of these abnormalities represented severe proliferation 
deficiencies, notably the inability to tolerate DNA-damaging drugs, such 
as cisplatin and 4-nitroquinolone, that strain DNA replication and 
repair (Figure 2, F and G, and Figure S3B). These abnormalities were 
evident also in relation to CBS 2775 and CBS 2776, which share 
haplotype, but not karyotype, structure with CBS 2777. 

Extensive karyotypic diversity in S. pombe 

The laboratory strain of S. pombe possesses three chromosomes. Each 
centromere includes two blocks of tandemly repeated heterochromatic 
sequences that are arranged palindromically around an A+T rich 
central core. There is extensive sequence homology between the 
six blocks of tandemly repeated centromeric DNA, suggesting the 
possibility of polymorphism arising either from exchange between 
the arrays of repeats at different centromeres or from instability of 
individual centromeric palindromes. In light of this potential poly- 
morphism, we analyzed the chromosomes by pulsed field gel electro- 
phoresis, filter transfer, and hybridization using a set of single-copy 
probes that lie centromere proximal and distal on the six arms of the 
laboratory strain karyotype and with a probe for the ribosomal DNA 
(rDNA), which occupies a subtelomeric position at the ends of the 
two arms of chromosome III. Simple ethidium bromide staining of the 
pulsed field gels showed that four African isolates (Y468, Y470, Y831, 
and Y832) were mixtures of different karyotypes and were subcloned. 
One of the isolates (CBS 374), which gave ambiguous results upon 



belonging to the same and to different haplotype groups. Means and standard errors of the means are displayed. (D) Pearson correlation 
coefficients were calculated between all pairs of strains with similar and diverging geographic origins. Means and standard errors of the means are 
displayed. (E) An S. pombe mean trait profile was calculated and the similarity (Pearson correlation) between the mean trait profile and the trait 
profile of each individual isolate was calculated. Isolates were ranked according to degree of similarity; the bottom three (most atypical) S. pombe 
isolates are displayed. The most typical S. pombe isolate, Y 831, is shown for comparison. (F, G) Proliferation of the S. pombe karyotype extreme 
CBS 2777 in presence of the DNA-damaging drugs cisplatin (F) and 4-nitroquinolone (G). Strains showing typical S. pombe behavior in these 
environments are included for comparison. 
Figure 4 Rearrangement of the S. pombe karyotype by translocations of the rDNA and associated sequences. (A) DNA extracted from five 
different natural isolates (UWOPS 94.422.2, DBVPG 6281, DBVPG 6417, DBVPG 6449, and DBVPG 2805) and from laboratory strain CRUK 972 
was size-fractionated by pulsed field gel electrophoresis and then analyzed by filter hybridization with centromere-proximal and -distal probes from 
each of the six arms of the three chromosomes and with a probe derived from rDNA. The figure illustrates the original ethidium bromide-stained 
gels and the results of the filter hybridization. Four isolates had karyotypes in which the rDNA was rearranged. In UWOPS 94.422.2, the rDNA is 
translocated in its entirety onto chromosome I. In DBVPG 2805, the rDNA on the right arm of the laboratory strain chromosome III is translocated 
together with the distal IIIR sequence onto chromosome I, and distal IR sequence is translocated onto chromosome III. In DBVPG 6417, distal IIIL 
sequences are translocated onto chromosome II, and distal IIR sequences are translocated onto chromosome III. DBVPG 6281 appears to be 
a derivative of DBVPG 6417 as the rDNA is now present on chromosome I and distal IL sequences are present on chromosome II. (B) An 
ideogrammatic interpretation of these results together with an indication of the approximate positions of the probes used in the filter 
hybridization. 



first pulsed field gel analysis, was also subcloned. Together these 
subclones defined 15 additional strains, which we designate NOTT 133-145, 
NOTT 147, and NOTT 148. We subsequently sequenced these 
subclones at the most informative (as defined by number of haplotypes) 
of the seven loci that we had sequenced in the original set of 84 
strains (SPBC660.16, TER1, and the central core of the centromere 
of chromosome II), and we confirmed that they were isogenic with 
respect to the original uncloned isolates. The pulsed field gel and filter 
hybridization analysis identified three types of gross chromosome 
rearrangement. Four isolates (UWOPS 94.422.2, DBVPG 2805, 
DBVPG 6417, and DBVPG 6281) had karyotypes in which the rDNA 
was rearranged. The nature of these rearrangements was analyzed as 
shown in Figure 4A to produce the ideograms shown in Figure 4B. 
Each of these rearrangements was unique in the collection. We conclude
that S. pombe shows extensive exchange of subtelomeric DNA sequences,
including rDNA. It is striking that the sequences exchanged between
the chromosomes extend hundreds of kilobases into single-copy,
chromosome-specific DNA. It is also notable that there is extensive
size variation between chromosomes not detectably involved in the
translocation of the rDNA arrays. The cause of this variation remains
to be determined but may include translocations and exchanges of
subtelomeric DNA that our analyses failed to detect. The second type of
rearrangement involved sequences normally present on chromosomes I and
II of the laboratory strain. To facilitate discussion, we refer to
rearranged chromosomes of this type using Arabic numerals, with the
largest chromosome in any one strain termed chromosome 1. We identified
six different rearrangements of this type. One of these rearrangements
was present in what is referred to as the S. pombe type strain, CBS 356
(Figure 5A), and in five other strains (NCYC 132, NCYC 535, DBVPG
2817, DBVPG 4437, and AWRI 141) that are widely distributed
over the Old World and Australia (Table 1, Figure S4). These
strains together defined haplotype 16, which includes specific locus
haplotypes at the three loci in and around CEN2 and at the large
intron of SPBC660.16. The widespread distribution of the strains with
this haplotype suggests that the karyotype variant is long-standing
relative to the other rearrangements, and thus the haplotype 16 karyotype
may define an incipient species. The other translocations were
detected in single strains (NOTT 138, NOTT 140, NOTT 142, NOTT
143, and NOTT 145), all of which were derived by subcloning three
African isolates. We analyzed all of these rearranged karyotypes first
by pulsed field gel electrophoresis, then by filter transfer and
Table 5 Regions of CRUK 972, CBS 356, CBS 2777, NOTT 138, 141, 143, and 145 showing copy number variation

Strain                  Chromosome  Rearrangement        Start breakpoint   Finish breakpoint  Size (kb)  Comments

NOTT 141                I           Duplication          3031645-3031834    3211591-3211839    180

NOTT 143, NOTT 145      I           Duplication          3031645-3031834    3233014-3233274    201

NOTT 138                I           Deletion             124442-124568      124866-125001      0.3        No annotated gene

NOTT 138                I           Duplication          5499761-5500081    5500384-5502054    2          Part of a galactosidase (mel1) gene

CRUK 972                II          Triplication         1598396-161093     1603682-1620842    19         dGdH repeats

NOTT 141, NOTT 143,     II          Duplication          2036773-2037231    2041544-2041813    4          Includes ccc2-Menkes disease protein
CBS 2777

NOTT 138, NOTT 141,     II          Deletion             356108-356292      359569-360000      3          SPBC1271.07c, SPBC1271.08c, mug96
NOTT 143, NOTT 145,
CBS 2777

NOTT 138                II          Loss of copy number  2114026-2115566    2116080-2116407    0.5        Mating type locus

CBS 356, NOTT 141,      III         Deletion             1493999-1494480    1499706-1500311    5          Deletion of pseudogene SPCC188.10c-1
NOTT 143, NOTT 145,
CBS 2777

NOTT 141, NOTT 143,     III         Duplication          1893874-1894205    1926082-1926689    32         SPCC737.05-08, hmt1 (ABC transporter involved in response to Cd++), mug24, sly1
NOTT 145 (4 copies)

NOTT 141, NOTT 143,     III         Duplication          1174389-1174518    1180220-1180872    6          Includes SPCC4B3.01-1 putative 3-mercaptopyruvate sulfurtransferase
NOTT 145

NOTT 141, NOTT 143      III         Duplication          381359-383288      383347-383709                 SPCC1682.06

NOTT 138                III         Deletion             969398-970117      971646-971940      1          No annotated gene

CBS 2777, NOTT 138,     III         Deletion             1645637-1646096    1648511-1649231    2          Deletion of pseudogene SPCC663.07c-1
NOTT 141, NOTT 143,
NOTT 145

CBS 356, CBS 2777,      III         CNV                  2108532-2109334    2109565-2111026    1          wtf22; listed as a pseudogene, polymorphic
NOTT 138, NOTT 141,
NOTT 143

The indicated strains were analyzed by comparative genome hybridization using Agilent 4 x 44K ChIP-on-chip arrays, and the indicated regions of copy number 
variation were detected. 
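
Each breakpoint in Table 5 is given as a coordinate interval, not as a single position. The following minimal sketch, which is not part of the original analysis, shows how such intervals bound the extent of a region. It assumes that the true breakpoint lies somewhere within each interval and that the size values in the table are in kilobases; the function name span_bounds_kb and the row labels are hypothetical.

```python
# Breakpoint intervals copied from three rows of Table 5:
# (start_lo, start_hi, finish_lo, finish_hi), in bp on the reference chromosome.
# Assumption: each true breakpoint lies somewhere inside its interval.
REGIONS = {
    "NOTT 141, chr I duplication": (3031645, 3031834, 3211591, 3211839),
    "NOTT 143/145, chr I duplication": (3031645, 3031834, 3233014, 3233274),
    "NOTT 141/143/145, chr III duplication": (1893874, 1894205, 1926082, 1926689),
}


def span_bounds_kb(start_lo, start_hi, finish_lo, finish_hi):
    """Return the (minimum, maximum) extent of a region in kb.

    The minimum runs from the end of the start interval to the beginning
    of the finish interval; the maximum runs between the outer bounds.
    """
    minimum = (finish_lo - start_hi) / 1000.0
    maximum = (finish_hi - start_lo) / 1000.0
    return minimum, maximum


if __name__ == "__main__":
    for label, coords in REGIONS.items():
        low, high = span_bounds_kb(*coords)
        print(f"{label}: {low:.1f}-{high:.1f} kb")
```

For these three rows the bounds come out at roughly 179.8-180.2 kb, 201.2-201.6 kb, and 31.9-32.8 kb, which agree to within about 1 kb with the tabulated sizes of 180, 201, and 32.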



hybridization (Figure 5A), and then for two of the karyotypes, NOTT
143 and NOTT 145, by comparative genomic hybridization (CGH) of
gel-purified chromosomes to microarrays (Figure 5B), followed by
PCR analysis and sequencing (File S1) across hypothetical breakpoints
to produce the maps shown in Figure 5C. These demonstrate features
held in common by the two rearrangements: first, they are both
translocations, and second, they both share a 2,227,883 bp pericentric
inversion with respect to chromosome I of the laboratory strain. PCR
showed that this inversion is present in all of the strains in the
collection with the exception of DBVPG 2805, DBVPG 6610, DBVPG
4433, DBVPG 6279, DBVPG 6699, CRUK 972, and CRUK 975. These
strains belong to haplotype 18 or the closely related haplotype 25.
However, these strains have a widespread distribution, which suggested
that the laboratory strain arrangement and possibly other
rearrangements on a shared background are acting as barriers to fertile
mating. Sequencing across the inversion and translocation breakpoints
(File S1 and Figure S5) showed only microhomology in two of five
sequences involved in the exchanges. Repetitive sequences were absent.
The fact that the laboratory strain arrangement of this 2.23 Mb segment
was present in only a minority of strains also suggested that it was a
derived trait, which we confirmed by comparison with the
arrangement of the homologous sequences in S. octosporus and
S. cryophilus (not shown). Pulsed field gel and filter hybridization
analysis (Figure 5A) indicated that the rearrangements in NOTT



Figure 5 Rearrangement of the S. pombe karyotype by translocations between chromosomes I and II. (A) DNA extracted from seven different 
natural isolates (CBS 356, CBS 2777, NOTT 136, NOTT 140, NOTT 142, NOTT 143, and NOTT 145) and laboratory strain CRUK 972 was 
size-fractionated by pulsed field gel electrophoresis and then analyzed after transfer by filter hybridization with centromere-proximal and distal probes 
from each of the six arms of the three chromosomes and with probes for the centromeric dGdH repeat and the central core of chromosome II. The 
figure shows the original ethidium bromide-stained gels and the results of the filter hybridization. (B) DNA was extracted from strains NOTT 
143 and NOTT 145, size-fractionated by pulsed field gel electrophoresis, and analyzed together with unfractionated DNA by CGH using Agilent 
44K ChIP-on-chip arrays, with unfractionated DNA from the laboratory strain CRUK 975 as competitor. In the case of NOTT 145, chromosomes 2 
and 3 are very similar in size, so there is cross-contamination in the CGH experiments, but this does not obscure the details of the translocation 
between sequences that normally reside on chromosomes I and II. (C) Ideogrammatic representation of the CGH results together with an 
indication of the approximate positions of the probes used in the filter hybridization. Also indicated is the position of the 2.23 Mb pericentric 
inversion of sequences with respect to laboratory strain chromosome I. This inversion is present in both NOTT 143, as indicated by the results of 
the CGH experiments, and NOTT 145, as indicated by the results of PCR analysis. For a detailed discussion of the experimental approaches 
needed to define the rearrangements, see File S1. 
140 and 142 were also simple reciprocal translocations, and they were 
not characterized further. Filter hybridization and CGH microarray 
analysis showed that CBS 356 and NOTT 138 also contained trans- 
locations or transpositions, but in both cases, one of the two partner 
chromosomes was not detectably rearranged when analyzed by CGH 
on microarrays (not shown). The simplest explanation for these 
results is that one of the breakpoints is close to a telomere. 

In addition to the analyses of the chromosomal rearrangements in
CBS 356, NOTT 138, NOTT 140, and NOTT 142, we carried out
CGH microarray analysis on unfractionated DNA of strains CRUK 972,
NOTT 141, NOTT 143, NOTT 145, and CBS 2777 using CRUK 975 DNA as
a competitor. This CGH analysis detected extensive regions of copy
number variation (Table 5). Three of the eight regions that are
amplified include genes that are involved in resistance to heavy metals,
consistent with exposure to Bordeaux mixture (CuSO4/Ca(OH)2) used
as a fungicide in vineyards.

The strain CBS 2777 contained four chromosomes (Figure 5A). 
Preliminary pulsed field gel and filter hybridization analysis of intact 
chromosomal DNA demonstrated a complex pattern of sequence 
rearrangements. Detailed functional and structural analysis of this 
strain will be described elsewhere. 

DISCUSSION 

In many organisms, natural isolates have been a source of variation for 
the analysis of pathways and processes of basic biological interest. In 
the short term, the aim of such studies is to increase knowledge of the 
components of these pathways and processes, whereas in the long 
term, the aim is to understand the architecture of a trait or feature at 
all levels from DNA to phenotype in terms of the ecology of the 
respective organism (Mackay et al. 2009). We initiated the study of the 
variation of S. pombe because this organism is easy and cheap to 
manipulate and analyze and because much is known about the cell 
biology of S. pombe. We showed that the amount of phenotypic 
variation segregating in S. pombe is small in comparison to that seen, 
for example, in S. cerevisiae and that the existence of karyotypic 
rearrangements may limit access to some of it. We conclude that 
the variation that does exist (in size, for example) will need to be 
studied in a focused way. Our pulsed field gel-based approach is likely
to have revealed only a subset of the translocations,
inversions, and deletions segregating in the collection; thus, any subset
of strains segregating a trait of particular interest will need to be
completely sequenced and characterized in terms of rearrangements.
In light of the expense required to carry out such work, it would seem 
prudent to start with the most significant of the variable traits and 
characterize the strains segregating this variation first. 

Cell size is a feature of the biology of S. pombe that has been 
intensively studied, is of general biological importance, and shows clear 
evidence of variation among the different isolates. Such variation is also 
observed in budding yeast (Nogami et al. 2007). It may be that this
variation is adaptively neutral and that the correlation between the size of
a particular haplotype and the source from which it was isolated, which is
observed in haplotypes 3 and 4 of Brazilian origin, has arisen by drift.
Alternatively, the variation may be adaptively significant and related to
the fact that the strains were isolated from different environments:
cachaça must (haplotype 3) or the frozen pulp of Eugenia uniflora
(haplotype 4). A combination of experimental and ecological studies
should discriminate between these alternative explanations and
improve our understanding of this important aspect of cell physiology.

Although karyotypic diversity will make it necessary to adapt and
apply the approaches used to exploit diversity in other model
organisms to S. pombe, karyotype diversity is itself of interest. Thus,
an understanding of the karyotype diversity is essential if we are to
understand the nucleotide diversity because inversions and
translocations will limit exchange between segments of DNA and strains.
Rearranged chromosomes have been useful both as genetic reagents and in
terms of their ability to cast new and often unexpected perspectives on
fundamental problems. Thus the discovery of strain CBS 2777 with
four chromosomes poses interesting questions about the mechanism
of formation of additional centromeres and telomeres. The unexpectedly
high levels of karyotypic diversity that characterize S. pombe
worldwide also require explanation in population genetic and
mechanistic terms. The simplest explanation, at the population genetic
level, is that S. pombe exists in small, rarely out-crossing populations
that favor the accumulation of weakly deleterious mutations, including
karyotypic rearrangements. This explanation is consistent both with
the pattern of linkage disequilibrium (LD) around CEN2 and TER1 and with
the observation that most of the rearrangements occur once in the
collection or are confined to single haplotypes (Table S2). Note, however,
that our perspective on out-crossing is based on only two regions
of the genome and the patterns of LD within these regions will in turn
be sensitive to whether they include any common rearrangements.
Genome-wide estimates of LD will thus also require a knowledge of
the karyotype diversity. A second, and not exclusive, adaptive
explanation for the high levels of karyotypic diversity at the population
genetic level is suggested by the isogenic rearrangements isolated by
subcloning the African strains Y468, Y470, Y831, and Y832. The fact
that the substrains are isogenic at the sequence level, vary
karyotypically, and are derived from the same isolates raises the possibility that
adaptive radiation may be occurring within a limited geographic
area and that the rearrangements are acting to isolate variants adapted
to different, specific subniches. Although it may be possible to
investigate such hypothetical microspeciation, a potential limitation of
this line of work is that the traits under selection may be significant
only for yeast ecology (e.g., sugar utilization) and not of general
interest as regards eukaryotic cell biology. However, it seems worthwhile
to study this aspect of the biology of S. pombe because speciation is
a problem of general interest that remains incompletely understood,
and S. pombe offers the possibility of rigorous analysis and
understanding. In this respect, our population level approach extends the
comparative genomics analysis of four different Schizosaccharomyces
species by Rhind et al. (2011). Deeper understanding at both levels
should help cast light on the mode and tempo of speciation in
Schizosaccharomyces spp. and, in particular, on the relative significance of
karyotypic and gene incompatibility as isolating mechanisms in this
genus (Greig 2009).

It is clear from our analysis that our understanding of the origins
and spread of S. pombe would be improved not only by more
sequence data and data about the rearrangements segregating in the
population but also by the collection and analysis of many more
strains from around the world. S. pombe will grow at high
concentrations of glucose and at low pH. An enrichment medium for S.
pombe has been described (Florenzano et al. 1977). Thus, it may be
possible to collect many more strains, although the ease with which 
this might be done is not yet clear. In particular, more strains from 
Africa and South and East Asia will be necessary to establish the 
geographic origins and phylogeny of the species. The success of the 
Brazilian collection expeditions suggests that it should be possible to 
isolate such strains. This work should also help define the range of 
environments that harbor S. pombe, provide additional karyotypic 
variants, and perhaps identify convenient locations for field work. 